The data-set I will be working with for this project is the white wine quality data-set provided by Udacity. The following tables show the names of the thirteen variables in the white wine data-set (I will add one variable of my own), the structure of the variable fields, and a quantile summary of each variable field. As I explore this data I will be focusing on one major question: what makes a quality bottle of wine?
The following code was used to create a rating variable. I did this for better grouping of the data and easier viewing. Anything of quality 3, 4, or 5 is assigned ‘Bad’; 6 is ‘Average’, 7 is ‘Good’, 8 is ‘Great’, and 9 is ‘Excellent’.
# Add a variable named "rating" and assign a text label to each quality score.
wineQuality$rating <- ifelse(wineQuality$quality <= 5, 'Bad',
                      ifelse(wineQuality$quality < 7, 'Average',
                      ifelse(wineQuality$quality < 8, 'Good',
                      ifelse(wineQuality$quality < 9, 'Great', 'Excellent'))))
# Store the labels as an ordered factor so that Bad < Average < ... < Excellent.
wineQuality$rating <- ordered(wineQuality$rating,
                              levels = c('Bad', 'Average', 'Good', 'Great',
                                         'Excellent'))
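As an aside, the same mapping from quality score to rating could probably be written with base R's `cut()`. This is only a sketch on a toy vector, not the code used in this report; the `quality` vector below is made up, and the result matches the nested `ifelse()` only because quality scores are whole numbers.

```r
# Toy quality scores; in the report these would come from wineQuality$quality.
quality <- c(3, 4, 5, 6, 7, 8, 9)

# cut() with its default right = TRUE builds intervals of the form (lo, hi]:
# (-Inf,5] -> Bad, (5,6] -> Average, (6,7] -> Good, (7,8] -> Great, (8,Inf] -> Excellent
rating <- cut(quality,
              breaks = c(-Inf, 5, 6, 7, 8, Inf),
              labels = c("Bad", "Average", "Good", "Great", "Excellent"),
              ordered_result = TRUE)

# Show each score next to the label it received.
print(data.frame(quality, rating))
```

One call replaces four nested conditions, and `ordered_result = TRUE` gives the ordered factor directly.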
The names of the variables in the data frame.
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality" "rating"
The structure of the data frame. There are 4,898 observations of 14 variables, or columns, and the type of each variable is listed as well.
## 'data.frame': 4898 obs. of 14 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## $ rating : Ord.factor w/ 5 levels "Bad"<"Average"<..: 2 2 2 2 2 2 2 2 2 2 ...
Below is a summary of each variable shown as quantiles. This data can give us some clues as to what is happening within our data. For example, residual.sugar and free.sulfur.dioxide have some potential outliers. This can be seen by comparing the min and max values with the mean. Since these outliers have not pulled the mean too far away from the median, I assume that they do not misrepresent the data. As this data-set is tidy, I fortunately have no NA’s.
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00 Min. : 9.0
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00 1st Qu.:108.0
## Median : 5.200 Median :0.04300 Median : 34.00 Median :134.0
## Mean : 6.391 Mean :0.04577 Mean : 35.31 Mean :138.4
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00 3rd Qu.:167.0
## Max. :65.800 Max. :0.34600 Max. :289.00 Max. :440.0
## density pH sulphates alcohol
## Min. :0.9871 Min. :2.720 Min. :0.2200 Min. : 8.00
## 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100 1st Qu.: 9.50
## Median :0.9937 Median :3.180 Median :0.4700 Median :10.40
## Mean :0.9940 Mean :3.188 Mean :0.4898 Mean :10.51
## 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500 3rd Qu.:11.40
## Max. :1.0390 Max. :3.820 Max. :1.0800 Max. :14.20
## quality rating
## Min. :3.000 Bad :1640
## 1st Qu.:5.000 Average :2198
## Median :6.000 Good : 880
## Mean :5.878 Great : 175
## 3rd Qu.:6.000 Excellent: 5
## Max. :9.000
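To make the outlier remark about residual.sugar and free.sulfur.dioxide a little more concrete, here is a rough check using the quartiles and maxima printed in the summary above. The 1.5 x IQR fence is a common rule of thumb that I am assuming here; it is not something the report itself computed, and the `stats` data frame is just a hand-typed copy of the printed numbers.

```r
# Quartiles and maxima copied by hand from the summary output above.
stats <- data.frame(
  variable = c("residual.sugar", "free.sulfur.dioxide"),
  q1  = c(1.7, 23),
  q3  = c(9.9, 46),
  max = c(65.8, 289)
)

# Upper Tukey fence: Q3 + 1.5 * IQR (an assumed rule of thumb).
stats$upper_fence <- stats$q3 + 1.5 * (stats$q3 - stats$q1)

# TRUE when the largest observed value lies beyond the fence.
stats$max_beyond_fence <- stats$max > stats$upper_fence

print(stats)
```

Both maxima sit well beyond their fences, which is consistent with calling them potential outliers.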
The table below shows the wine count by rating. Most wines scored a 6(Average), but there are fewer wines above 6(Average) than below it.
It also shows that I have only 180 wines that would be considered Great or Excellent. This may help isolate what makes a quality glass of wine moving forward.
##
## Bad Average Good Great Excellent
## 1640 2198 880 175 5
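The counts in this table can be checked with a little arithmetic. The sketch below simply retypes the five counts from the table; the name `counts` is my own and nothing here reads the real data frame.

```r
# Counts per rating, copied from the table above.
counts <- c(Bad = 1640, Average = 2198, Good = 880, Great = 175, Excellent = 5)

total <- sum(counts)                                   # all wines in the data-set
below <- counts[["Bad"]]                               # rated below 6(Average)
above <- sum(counts[c("Good", "Great", "Excellent")])  # rated above 6(Average)
top   <- sum(counts[c("Great", "Excellent")])          # Great or Excellent

# Share of wines rated Great or Excellent, in percent.
top_share <- round(100 * top / total, 2)

print(c(total = total, below = below, above = above, top = top, top_share = top_share))
```

This gives 1,640 wines below Average against 1,060 above it, and the 180 Great or Excellent wines come to roughly 3.7% of the 4,898 samples.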
Exploring the quality by number of samples.
Exploring the alcohol content by number of samples.
Exploring free sulfur dioxide by number of samples.
Exploring total sulfur dioxide by number of samples.
This table represents the five types of 9(Excellent) quality white wines. It is interesting that only five wines earned such recognition out of almost 5000.
| | X | fixed.acidity | volatile.acidity | citric.acid | residual.sugar | chlorides | free.sulfur.dioxide | total.sulfur.dioxide | density | pH | sulphates | alcohol | quality | rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 775 | 775 | 9.1 | 0.27 | 0.45 | 10.6 | 0.035 | 28 | 124 | 0.99700 | 3.20 | 0.46 | 10.4 | 9 | Excellent |
| 821 | 821 | 6.6 | 0.36 | 0.29 | 1.6 | 0.021 | 24 | 85 | 0.98965 | 3.41 | 0.61 | 12.4 | 9 | Excellent |
| 828 | 828 | 7.4 | 0.24 | 0.36 | 2.0 | 0.031 | 27 | 139 | 0.99055 | 3.28 | 0.48 | 12.5 | 9 | Excellent |
| 877 | 877 | 6.9 | 0.36 | 0.34 | 4.2 | 0.018 | 57 | 119 | 0.98980 | 3.28 | 0.36 | 12.7 | 9 | Excellent |
| 1606 | 1606 | 7.1 | 0.26 | 0.49 | 2.2 | 0.032 | 31 | 113 | 0.99030 | 3.37 | 0.42 | 12.9 | 9 | Excellent |
Exploring pH, I found that as quality rises, the mean and median of pH also rise, while the range between min and max generally decreases.
## wineQuality$rating: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.79 3.08 3.16 3.17 3.24 3.79
## ------------------------------------------------------------
## wineQuality$rating: Average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.080 3.180 3.189 3.280 3.810
## ------------------------------------------------------------
## wineQuality$rating: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.840 3.100 3.200 3.214 3.320 3.820
## ------------------------------------------------------------
## wineQuality$rating: Great
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.940 3.120 3.230 3.219 3.330 3.590
## ------------------------------------------------------------
## wineQuality$rating: Excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.200 3.280 3.280 3.308 3.370 3.410
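Per-rating summaries like the ones above look like the output of base R's `by()` combined with `summary()`. The sketch below shows that pattern on a small made-up data frame; `toy` and its values are hypothetical and stand in for `wineQuality`.

```r
# Toy data frame standing in for wineQuality (values are made up).
toy <- data.frame(
  pH = c(2.9, 3.1, 3.4, 3.0, 3.2, 3.3, 3.25, 3.3),
  rating = factor(c("Bad", "Bad", "Bad", "Average", "Average", "Average",
                    "Good", "Good"),
                  levels = c("Bad", "Average", "Good"), ordered = TRUE)
)

# Six-number summary of pH within each rating level.
print(by(toy$pH, toy$rating, summary))

# Spread (max - min) per rating, to check whether the range narrows.
spread <- tapply(toy$pH, toy$rating, function(v) diff(range(v)))
print(spread)
```

Computing the spread directly avoids having to subtract min from max by eye for each rating.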
This table of free.sulfur.dioxide shows some interesting numbers worthy of further investigation. The min free sulfur dioxide of an Excellent glass of white wine is notably higher than that of every other rating, and the max is significantly lower. This tight value range may hold the best clues, thus far, as to what makes an excellent glass of white wine.
## wineQuality$rating: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 20.00 34.00 35.34 49.00 289.00
## ------------------------------------------------------------
## wineQuality$rating: Average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 24.00 34.00 35.65 46.00 112.00
## ------------------------------------------------------------
## wineQuality$rating: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 25.00 33.00 34.13 41.00 108.00
## ------------------------------------------------------------
## wineQuality$rating: Great
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 28.00 35.00 36.72 44.50 105.00
## ------------------------------------------------------------
## wineQuality$rating: Excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 24.0 27.0 28.0 33.4 31.0 57.0
This table of total sulfur dioxide as it relates to quality is interesting. Excellent glasses of white wine have a much higher min than the other ratings and a smaller max as well. This range seems very specific to higher quality white wine.
## wineQuality$rating: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 117.0 149.0 148.6 182.0 440.0
## ------------------------------------------------------------
## wineQuality$rating: Average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 107.2 132.0 137.0 164.0 294.0
## ------------------------------------------------------------
## wineQuality$rating: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34.0 101.0 122.0 125.1 144.2 229.0
## ------------------------------------------------------------
## wineQuality$rating: Great
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 59.0 102.5 122.0 126.2 150.0 212.5
## ------------------------------------------------------------
## wineQuality$rating: Excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 85 113 119 116 124 139
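The narrowing range of total sulfur dioxide can be read straight off the min and max columns above. Here is a small sketch that retypes those values and computes the width of each range; the data frame `tsd` is hand-built for illustration.

```r
# Min and max of total sulfur dioxide per rating, copied from the tables above.
tsd <- data.frame(
  rating = c("Bad", "Average", "Good", "Great", "Excellent"),
  min = c(9, 18, 34, 59, 85),
  max = c(440, 294, 229, 212.5, 139)
)

# Width of the observed range for each rating.
tsd$width <- tsd$max - tsd$min
print(tsd)

# TRUE if every range is narrower than the one for the rating below it.
narrowing <- all(diff(tsd$width) < 0)
print(narrowing)
```

The widths fall from 431 for Bad to 54 for Excellent, though the Excellent group has only five wines, so its narrow range should be read with that in mind.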
I can see the same trend in this table of alcohol content: as quality rises, so does alcohol content. This is interesting because alcohol content is decided by other variables that are present, such as sugars, and I didn’t think one was supposed to swallow the wine when considering quality!
## wineQuality$rating: Bad
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.20 9.60 9.85 10.40 13.60
## ------------------------------------------------------------
## wineQuality$rating: Average
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 9.60 10.50 10.58 11.40 14.00
## ------------------------------------------------------------
## wineQuality$rating: Good
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.60 10.60 11.40 11.37 12.30 14.20
## ------------------------------------------------------------
## wineQuality$rating: Great
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 11.00 12.00 11.64 12.60 14.00
## ------------------------------------------------------------
## wineQuality$rating: Excellent
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 12.40 12.50 12.18 12.70 12.90
I have 4,898 observations of 14 variables, or columns. This is a data-set downloaded from Udacity and it is extremely tidy. Each variable is of type num except X and quality, which are of type int, and the rating variable I added, which is an ordered factor. The “X” variable identifies each wine, and I do not intend to use it in any of the analysis.
For this project I am interested in what defines a quality glass of wine. I can see that alcohol content, sulfur dioxide, pH, and density have interesting trends, but I will need to compare these variables further to determine how interesting they truly are.
I believe the variables that have the most dynamic range will help in isolating what makes a quality glass of wine. The potential outliers in several variable fields will be the starting point. The variables that have a very small range, such as density, may not yield as much meaningful insight. But, as I discovered while exploring further, the range of numbers is not an indicator of usefulness.
Yes, I created the variable “rating”. I found it useful to add a text phrase to the quality variable. I did this for better grouping of data by quality and for a better viewing experience. This made the data more meaningful versus numbers as a metric for quality.
The unusual distributions I observed are in residual.sugar, free.sulfur.dioxide, and total.sulfur.dioxide. I call these unusual because of how far the max value is from the mean. Though they are unusual, as can be seen from the univariate charts, these variables seem to play some part in a high quality glass of white wine. I believe that, as we move forward, the answer will require combining several variables in a single chart to reveal the relationships that make a high quality glass of wine.
Scatterplot
Histograms of alcohol content as it relates to quality. Each chart is a quality/rating level from 3(Bad) to 9(Excellent) showing alcohol content and the portions for the respective quality.
Histograms of residual sugar content as it relates to quality/rating. Each chart has a quality level from 3(Bad) to 9(Excellent) showing residual sugar and the portions for the respective quality/rating. As I am interested in quality wine I have subsetted this chart according to quality/rating greater than 6(Average), or the mean of quality.
These histograms show the relation of free sulfur dioxide as it relates to wine quality. Each chart has a quality level from 3(Bad) to 9(Excellent) showing free sulfur dioxide and the portions for the respective quality. As I am interested in quality wine I have subsetted this chart according to quality/rating greater than 6(Average), or the mean of quality.
These histograms show how total sulfur dioxide relates to wine quality. Each chart has a quality level from 3(Bad) to 9(Excellent) showing total sulfur dioxide and the portions for the respective quality. As I am interested in quality wine I have subsetted this chart according to quality/rating greater than 6(Average), or the mean of quality.
This chart shows how the range of total sulfur dioxide shrinks as quality or rating increases.
As above, this chart shows the shrinking range of free sulfur dioxide as quality or rating increases.
An interesting trend is developing: as this pH chart shows, the range of pH shrinks as quality or rating increases. This reinforces the thought that a fine median needs to be achieved for several variables to achieve quality.
Here we can see that as alcohol content (%) rises, quality trends up as well. Producing higher alcohol content must not be easy, or we would see many more Bad wines attempt to make up for low quality with high alcohol; since higher alcohol content is related to quality, that would otherwise seem like a good idea.
It was not surprising to see the correlation between quality and alcohol content, but it was surprising that alcohol content was the only variable that showed any possible correlation to quality outright. I suspect that quality is a product of more than two variables. Sulfur dioxide, both free and total, seems to play what may be the biggest part in white wines.
I observed that density is closely related to total sulfur dioxide and residual sugar. Density, however, was not closely correlated with quality. These are interesting data points, but they seem to be only loosely related to quality.
The strongest relationship was between residual sugar and density, with a Pearson’s r of 0.8389665 (95% confidence interval 0.8304732 to 0.8470698). This makes perfect sense. As with a cup of hot tea, the more sugar that is added, even after it has dissolved, the more the tea clouds. Most white wines trended towards lower residual sugar levels but still maintained high density. This would be due to total sulfur dioxide also having a strong correlation (0.5298813) with density. So two variables work at changing the density of white wine.
Here, I create a new table grouped by rating and free sulfur dioxide while showing the mean and median of total sulfur dioxide.
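library(dplyr)  # provides %>%, group_by(), summarise(), n(), and arrange()

# One row per (rating, free.sulfur.dioxide) pair with the mean, median, and count.
wineQuality.tsd_by_rating_fsd <- wineQuality %>%
  group_by(rating, free.sulfur.dioxide) %>%
  summarise(mean_tsd = mean(total.sulfur.dioxide),
            median_tsd = median(total.sulfur.dioxide),
            n = n(),
            .groups = "drop") %>%
  arrange(rating)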
The relationship between free and total sulfur dioxide is interesting. For every Excellent wine, the mean and median of total sulfur dioxide are equal, and free sulfur dioxide stays within the Excellent range of 24 to 57. As seen in the following tables, the other ratings do not quite share these qualities within that same free sulfur dioxide range. What is also interesting is that where the mean and median of total sulfur dioxide are equal within the free sulfur dioxide range of 24 to 57, the count is almost always 1 or 2. With so few samples in a group, the mean and median coincide by construction, so this equality should be read with some caution.
| rating | free.sulfur.dioxide | mean_tsd | median_tsd | n |
|---|---|---|---|---|
| Average | 24.0 | 115.4194 | 112.5 | 62 |
| Average | 25.0 | 116.1509 | 113.0 | 53 |
| Average | 26.0 | 117.6232 | 111.0 | 69 |
| Average | 27.0 | 120.8511 | 119.0 | 47 |
| Average | 28.0 | 126.5098 | 122.0 | 51 |
| Average | 29.0 | 124.8361 | 122.0 | 61 |
| Average | 30.0 | 124.5714 | 117.0 | 49 |
| Average | 31.0 | 126.9627 | 117.0 | 67 |
| Average | 32.0 | 131.6852 | 121.5 | 54 |
| Average | 33.0 | 137.3788 | 136.0 | 66 |
| Average | 34.0 | 134.7547 | 132.0 | 53 |
| Average | 35.0 | 139.3333 | 138.0 | 45 |
| Average | 36.0 | 130.9296 | 122.0 | 71 |
| Average | 37.0 | 140.2241 | 130.0 | 58 |
| Average | 38.0 | 137.9216 | 131.0 | 51 |
| Average | 38.5 | 245.0000 | 245.0 | 1 |
| Average | 39.0 | 154.5581 | 151.0 | 43 |
| Average | 39.5 | 216.5000 | 216.5 | 1 |
| Average | 40.0 | 139.8611 | 141.0 | 36 |
| Average | 41.0 | 147.5455 | 142.5 | 44 |
| Average | 42.0 | 148.0408 | 146.0 | 49 |
| Average | 43.0 | 147.5714 | 142.5 | 28 |
| Average | 44.0 | 152.5278 | 161.0 | 36 |
| Average | 44.5 | 234.0000 | 234.0 | 2 |
| Average | 45.0 | 145.3056 | 140.5 | 36 |
| Average | 46.0 | 156.7826 | 149.0 | 23 |
| Average | 47.0 | 154.2553 | 156.0 | 47 |
| Average | 48.0 | 173.3448 | 180.0 | 29 |
| Average | 49.0 | 158.2051 | 159.0 | 39 |
| Average | 50.0 | 161.7000 | 156.0 | 30 |
| Average | 51.0 | 176.7667 | 170.0 | 30 |
| Average | 52.0 | 173.2432 | 177.0 | 37 |
| Average | 52.5 | 113.0000 | 113.0 | 2 |
| Average | 53.0 | 173.2973 | 166.0 | 37 |
| Average | 54.0 | 171.3636 | 167.0 | 22 |
| Average | 55.0 | 180.2857 | 190.0 | 21 |
| Average | 56.0 | 173.6154 | 177.0 | 26 |
| Average | 57.0 | 169.7727 | 168.0 | 22 |
| rating | free.sulfur.dioxide | mean_tsd | median_tsd | n |
|---|---|---|---|---|
| Good | 24.0 | 108.0625 | 99.5 | 16 |
| Good | 25.0 | 125.1724 | 133.0 | 29 |
| Good | 26.0 | 117.8214 | 114.0 | 28 |
| Good | 27.0 | 108.9444 | 100.5 | 18 |
| Good | 28.0 | 110.1739 | 110.0 | 23 |
| Good | 29.0 | 124.8909 | 121.0 | 55 |
| Good | 30.0 | 113.7727 | 112.5 | 22 |
| Good | 31.0 | 111.3333 | 113.0 | 27 |
| Good | 32.0 | 134.7083 | 136.0 | 24 |
| Good | 33.0 | 133.7778 | 125.0 | 27 |
| Good | 34.0 | 116.8077 | 114.5 | 26 |
| Good | 35.0 | 140.3750 | 128.0 | 40 |
| Good | 36.0 | 120.3913 | 122.0 | 23 |
| Good | 37.0 | 124.0000 | 110.5 | 20 |
| Good | 38.0 | 124.5455 | 119.5 | 22 |
| Good | 39.0 | 125.6500 | 122.5 | 20 |
| Good | 40.0 | 124.5294 | 119.0 | 34 |
| Good | 41.0 | 136.3929 | 142.0 | 28 |
| Good | 41.5 | 195.0000 | 195.0 | 2 |
| Good | 42.0 | 120.3846 | 120.0 | 13 |
| Good | 43.0 | 129.2222 | 132.0 | 9 |
| Good | 44.0 | 134.3000 | 136.0 | 10 |
| Good | 44.5 | 129.5000 | 129.5 | 2 |
| Good | 45.0 | 137.3913 | 138.0 | 23 |
| Good | 46.0 | 138.1176 | 143.0 | 17 |
| Good | 47.0 | 142.9286 | 134.5 | 14 |
| Good | 48.0 | 148.6000 | 139.5 | 10 |
| Good | 48.5 | 226.5714 | 229.0 | 7 |
| Good | 49.0 | 160.4615 | 164.0 | 13 |
| Good | 50.0 | 150.3333 | 149.0 | 9 |
| Good | 51.0 | 141.8333 | 130.5 | 6 |
| Good | 52.0 | 151.5000 | 158.0 | 6 |
| Good | 52.5 | 158.0000 | 158.0 | 1 |
| Good | 53.0 | 149.6667 | 143.0 | 6 |
| Good | 54.0 | 134.4000 | 128.0 | 5 |
| Good | 55.0 | 164.0909 | 149.0 | 11 |
| Good | 56.0 | 152.5000 | 152.5 | 2 |
| Good | 57.0 | 154.0000 | 156.0 | 3 |
| rating | free.sulfur.dioxide | mean_tsd | median_tsd | n |
|---|---|---|---|---|
| Great | 24 | 112.6667 | 125.0 | 3 |
| Great | 25 | 111.5000 | 111.5 | 2 |
| Great | 26 | 112.6667 | 109.0 | 3 |
| Great | 27 | 94.0000 | 94.0 | 2 |
| Great | 28 | 103.2500 | 97.0 | 4 |
| Great | 29 | 114.0000 | 118.0 | 11 |
| Great | 30 | 133.3750 | 117.0 | 8 |
| Great | 31 | 117.1429 | 119.0 | 7 |
| Great | 32 | 136.2500 | 136.5 | 4 |
| Great | 33 | 108.5000 | 108.5 | 2 |
| Great | 34 | 115.5000 | 112.0 | 8 |
| Great | 35 | 134.0000 | 135.0 | 3 |
| Great | 36 | 108.7500 | 113.5 | 4 |
| Great | 37 | 116.6000 | 122.0 | 10 |
| Great | 38 | 130.0000 | 132.0 | 4 |
| Great | 39 | 131.8333 | 123.5 | 6 |
| Great | 40 | 104.0000 | 104.0 | 1 |
| Great | 41 | 111.8000 | 98.0 | 5 |
| Great | 42 | 154.5000 | 154.5 | 2 |
| Great | 43 | 137.7778 | 145.0 | 9 |
| Great | 44 | 137.0000 | 137.0 | 1 |
| Great | 45 | 154.1111 | 155.0 | 9 |
| Great | 46 | 132.7500 | 132.5 | 4 |
| Great | 48 | 114.0000 | 114.0 | 1 |
| Great | 49 | 150.7500 | 144.5 | 4 |
| Great | 50 | 151.5000 | 151.5 | 2 |
| Great | 51 | 165.0000 | 165.0 | 1 |
| Great | 53 | 212.5000 | 212.5 | 6 |
| Great | 54 | 156.3333 | 155.0 | 3 |
| Great | 56 | 140.0000 | 140.0 | 2 |
| rating | free.sulfur.dioxide | mean_tsd | median_tsd | n |
|---|---|---|---|---|
| Excellent | 24 | 85 | 85 | 1 |
| Excellent | 27 | 139 | 139 | 1 |
| Excellent | 28 | 124 | 124 | 1 |
| Excellent | 31 | 113 | 113 | 1 |
| Excellent | 57 | 119 | 119 | 1 |
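A quick way to see why groups with a count of 1 or 2 always show matching mean and median is to try it on a few tiny vectors. The numbers below are made up and are not taken from the wine data.

```r
# Hypothetical total sulfur dioxide values for groups of size 1, 2, and 3.
one   <- c(119)
two   <- c(110, 150)
three <- c(110, 150, 220)

# With one value, the mean and the median are both that value.
print(c(mean = mean(one), median = median(one)))

# With two values, both are the midpoint of the pair.
print(c(mean = mean(two), median = median(two)))

# With three or more values they can differ, as they do here.
print(c(mean = mean(three), median = median(three)))
```

So for the Excellent rows, where every group holds a single wine, equal mean and median is expected and does not by itself say anything about the wine.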
“In winemaking, the use of sulfur dioxide (SO2) is critical. We tend to talk a lot about free SO2 (FSO2) in particular, and not without good reason. The FSO2 and the pH of your wine determine how much SO2 is available in the active, molecular form to help protect the wine from oxidation and spoilage. FSO2 is also something we have to keep a close eye on, because it can be hard to predict how much will be lost, and at what rate, to binding or to aeration. Too much FSO2 can be perceptible to consumers, by masking the wine’s own fruity aromas and inhibiting its ability to undergo the cascade of oxygen-using reactions that happen when it “breathes,” or, in high enough concentrations, by contributing a sharp/bitter/metallic/chemical flavor or sensation.”
Moroney, M. (2018, February 27). Total sulfur dioxide - why it matters, too! Retrieved March 04, 2021, from https://www.extension.iastate.edu/wine/total-sulfur-dioxide-why-it-matters-too
This confirms what I believe I am seeing, but I do not believe it to be the whole story. Sulfur dioxide seems to set the tone for the senses, such as smell and taste, but I also want to unlock what role alcohol content plays in quality, or whether it is something of a coincidence. Put another way, do the same qualities that make a white wine Great or Excellent coincidentally also make the wine rich in alcohol, or do the folks in the sample just prefer to get “drunker, quicker”?
I also create a new data set for total and free sulfur dioxide by alcohol content.
wineQuality.tsd_by_alcohol <- wineQuality %>%
  group_by(alcohol, free.sulfur.dioxide, rating) %>%
  summarise(mean_tsd = mean(total.sulfur.dioxide),
            median_tsd = median(total.sulfur.dioxide),
            n = n(),
            .groups = "drop") %>%
  arrange(alcohol)
Here the mean and median of total sulfur dioxide are again equal, while free sulfur dioxide mostly falls within the tight range of 24 to 57, which is the min-to-max range for Excellent wines examined earlier. Interestingly, just as there are only five 9(Excellent) wines, the white wines with the highest alcohol content share the characteristics observed for 9(Excellent) white wines.
| alcohol | free.sulfur.dioxide | rating | mean_tsd | median_tsd | n |
|---|---|---|---|---|---|
| 14.00 | 12 | Average | 88 | 88 | 1 |
| 14.00 | 12 | Good | 120 | 120 | 1 |
| 14.00 | 33 | Good | 106 | 106 | 2 |
| 14.00 | 39 | Great | 150 | 150 | 1 |
| 14.05 | 31 | Good | 104 | 104 | 1 |
| 14.20 | 31 | Good | 113 | 113 | 1 |
I will examine the four strongest correlations.
Residual sugar and density.
##
## Pearson's product-moment correlation
##
## data: wineQuality$residual.sugar and wineQuality$density
## t = 107.87, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8304732 0.8470698
## sample estimates:
## cor
## 0.8389665
Total sulfur dioxide and free sulfur dioxide.
##
## Pearson's product-moment correlation
##
## data: wineQuality$total.sulfur.dioxide and wineQuality$free.sulfur.dioxide
## t = 54.645, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5977994 0.6326026
## sample estimates:
## cor
## 0.615501
Total sulfur dioxide and density.
##
## Pearson's product-moment correlation
##
## data: wineQuality$total.sulfur.dioxide and wineQuality$density
## t = 43.719, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5094349 0.5497297
## sample estimates:
## cor
## 0.5298813
Alcohol and quality.
##
## Pearson's product-moment correlation
##
## data: wineQuality$alcohol and wineQuality$quality
## t = 33.858, df = 4896, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4126015 0.4579941
## sample estimates:
## cor
## 0.4355747
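The four outputs above have the shape of base R's `cor.test()`. The sketch below runs it on simulated data (the `sugar` and `density` vectors are invented, not the wine data) and then checks that the reported t for residual sugar and density follows from r and the degrees of freedom via t = r * sqrt(df / (1 - r^2)).

```r
# Simulated stand-ins for residual sugar and density (made-up values).
set.seed(1)
sugar   <- runif(200, min = 0.6, max = 20)
density <- 0.99 + 0.0004 * sugar + rnorm(200, sd = 0.001)

# Pearson correlation test; returns an object of class "htest".
res <- cor.test(sugar, density, method = "pearson")
print(res$estimate)   # Pearson's r
print(res$conf.int)   # 95 percent confidence interval by default
print(res$p.value)

# Recompute t for residual sugar vs. density from the values printed above.
r  <- 0.8389665
df <- 4896
t_from_r <- r * sqrt(df / (1 - r^2))
print(t_from_r)       # should agree with the reported t = 107.87
```

Pulling `estimate` and `conf.int` out of the result makes it easier to quote the numbers in the text than copying them from the printed block.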
This chart uses box-plots to show the relation between free and total sulfur dioxide and how the window for quality narrows as rating increases.
This chart uses box-plots to show the relation of total sulfur dioxide to alcohol content across the quality ranges. The amount of total sulfur dioxide is very specific when achieving quality.
Total sulfur dioxide and free sulfur dioxide are key in creating an Excellent glass of wine. They are closely linked and there is a very tight range for perfection. Generally, total sulfur dioxide must be between 85 and 139, while free sulfur dioxide must be between 24 and 57. This only makes a Good glass. Excellence is achieved by hitting the median in both of those respective variables.
Alcohol content is interesting. I do not believe it to be a coincidence that alcohol is higher in better quality wine. This is a very delicate balance between chemical variables that requires masterful skill. I say this because of almost 5000 samples, only 5 made Excellent. If one can achieve that Excellent rating, then the chemical variables will interact perfectly, which will show in every aspect of the wine, including alcohol. I will speak more on this in reflection.
I found this interesting because this represents an avenue I did not venture down. I see the levels of pH change in a very noticeable way as quality goes up. This would make me think there is also an attachment of pH to quality as I see the range for 9(Excellent) is very small.
This chart reinforces the connection between total and free sulfur dioxide and how the ranges decrease as quality goes up. A very specific range, or balance, of free and total sulfur dioxide is needed to achieve quality. Even the outliers’ range shortens as quality rises, leaving fewer chances of achieving quality if the balance is not maintained.
This chart reinforced my thought that alcohol content is a by-product of quality. This chart, I feel, also embodies the complex chemistry it takes to make an excellent glass of white wine. In a “Bad” glass of wine the chemical equation has broken down, and it makes sense that the same broken equation would not achieve higher alcohol content.
It occurred to me, at the point where the reference appears in the report, that I don’t know anywhere near enough about what exactly merits an excellent glass of wine. I Googled, I read, and sulfur dioxide is important, as I found, but it’s the process that is really key. That process, with time variables measured against these same chemical variables, would likely yield more useful data in determining what makes quality. With that said, I think that before one can understand how to make use of this data, one needs to understand much more about the white wine making process.
Alcohol content was surprising. I was expecting to find it correlated with more than just quality. Mostly I was expecting some link to residual sugar, as I read that sugars play a part in fermentation. My assumption is that, being residual, this sugar does not relate to alcohol. Perhaps if I knew how much sugar was used in creation, I could find a correlation to alcohol content, but residual sugar seems to be what was left over from the fermentation process. This also makes sense of why residual sugar does correlate to density. I believe there may have been more interesting data points to explore here!